debakarr
GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 2 - Regression/Simple Linear Regression/[R] Simple Linear Regression.ipynb
Kernel: R
library("IRdisplay")
display_png(file="img/01.png")
Image in a Jupyter notebook

  • b0 is the constant (intercept), representing the base salary of anyone who enters the profession with no experience, i.e. Experience = 0

  • b1 is the coefficient, representing the slope: the more experience, the greater the raise in salary.

Here in the graph, the black line is the Best Fitting Line.
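The line described by the two bullets above can be sketched directly. This is a minimal illustration of Salary = b0 + b1 * Experience; the values of b0 and b1 here are made up for demonstration, not the fitted coefficients.

```r
# Illustrative regression line: Salary = b0 + b1 * Experience.
# b0 and b1 below are placeholder values, not the fitted ones.
b0 <- 27000   # base salary at zero years of experience (intercept)
b1 <- 9000    # raise in salary per additional year of experience (slope)

experience <- c(0, 1, 5, 10)
salary <- b0 + b1 * experience
salary
```

With zero years of experience the prediction is just b0; every extra year adds b1.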


display_png(file="img/02.png")
Image in a Jupyter notebook

Actual value vs Model value and Ordinary Least Squares

display_png(file="img/03.png")
Image in a Jupyter notebook

Data Preprocessing

# Importing the dataset
dataset = read.csv('Salary_Data.csv')

# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Salary, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)
training_set
test_set

Fitting Simple Linear Regression to the Training Set

regressor = lm(formula = Salary ~ YearsExperience, data = training_set)
summary(regressor)
Call:
lm(formula = Salary ~ YearsExperience, data = training_set)

Residuals:
    Min      1Q  Median      3Q     Max
-7853.2 -3691.2   904.8  3191.0  8080.8

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      27232.5     2474.3   11.01 6.17e-10 ***
YearsExperience   9103.7      392.9   23.17 6.38e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5471 on 20 degrees of freedom
Multiple R-squared:  0.9641,	Adjusted R-squared:  0.9623
F-statistic:   537 on 1 and 20 DF,  p-value: 6.382e-16

The smaller the p-value, the more significant the independent variable is in the formula for the dependent variable.

Watch this video for more information on p-values
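The per-coefficient p-values printed in the summary above can also be extracted programmatically with `coef(summary(...))`. A small self-contained sketch (it refits the model on a toy data frame, since the real `Salary_Data.csv` is not bundled here):

```r
# Sketch: pull per-coefficient p-values out of an lm() summary.
# A toy data set stands in for Salary_Data.csv so the snippet runs on its own.
set.seed(42)
toy <- data.frame(YearsExperience = 1:10,
                  Salary = 30000 + 9000 * (1:10) + rnorm(10, sd = 1000))
regressor <- lm(Salary ~ YearsExperience, data = toy)

coefs <- coef(summary(regressor))   # matrix: Estimate, Std. Error, t value, Pr(>|t|)
p_values <- coefs[, "Pr(>|t|)"]     # the Pr(>|t|) column holds the p-values
p_values
```

With a slope this strong relative to the noise, the p-value for YearsExperience comes out far below 0.05, matching the `***` significance code in the summary.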


Predicting the Test set results

y_pred = predict(regressor, newdata = test_set)
y_pred
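One simple way to gauge how good these test-set predictions are is the root mean squared error (RMSE) between predicted and actual salaries. The vectors below are illustrative stand-ins, not the notebook's real `y_pred` and `test_set$Salary`:

```r
# Sketch: RMSE between actual and predicted salaries.
# These vectors are illustrative; in the notebook you would use
# test_set$Salary and y_pred instead.
actual    <- c(37731, 43525, 56957)
predicted <- c(40000, 42000, 55000)

rmse <- sqrt(mean((actual - predicted)^2))
rmse
```

A smaller RMSE means the fitted line tracks the held-out salaries more closely; it is in the same units as Salary, which makes it easy to interpret.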

Visualising the Training set results

  • X = Years of Experience

  • Y = Salary

# install.packages('ggplot2')  # package to plot graphs
library(ggplot2)

# geom for geometrical objects
ggplot() +
  geom_point(aes(x = training_set$YearsExperience, y = training_set$Salary),
             colour = 'red') +
  geom_line(aes(x = training_set$YearsExperience,
                y = predict(regressor, newdata = training_set)),
            colour = 'green') +
  ggtitle('Salary vs Experience (Training Set)') +
  xlab('Years of Experience') +
  ylab('Salary')
Image in a Jupyter notebook

Visualising the Test set results

ggplot() +
  geom_point(aes(x = test_set$YearsExperience, y = test_set$Salary),
             colour = 'red') +
  geom_line(aes(x = training_set$YearsExperience,
                y = predict(regressor, newdata = training_set)),
            colour = 'green') +
  ggtitle('Salary vs Experience (Test Set)') +
  xlab('Years of Experience') +
  ylab('Salary')
Image in a Jupyter notebook